CAPS: A Cross-genre Author Profiling System
Abstract
This paper describes the participation of the Cross-genre Author Profiling System (CAPS) in the PAN16 shared task [15]. The classification system uses parts of speech, collocations, connective words and various other stylometric features to differentiate between the writing styles of male and female authors as well as between different age groups. On the official test set (test set 2) for English, the system achieves the second-best score for gender identification: 74.36% accuracy, against 75.64% for the best performing system (BPS). Further, for age classification, we report an accuracy of 44.87% (BPS: 58.97%). For Spanish, CAPS reaches 62.50% (BPS: 73.21%) for gender and 46.43% (BPS: 51.79%) for age, while for Dutch the accuracy for gender (the task did not target age) is lowest at 55.00% (BPS: 61.80%). For comparison, we also tested CAPS on single-genre classification of author gender and age on the PAN14 and PAN15 datasets, achieving comparable performance.
Similar resources
Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016
Author profiling deals with the study of various profile dimensions of an author such as age and gender. This work describes our methodology proposed for the task of cross-genre author profiling at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, w...
Cross-Genre Age and Gender Identification in Social Media
This paper gives a brief description of the methods adopted for the author profiling task as part of the PAN 2016 competition [1]. Author profiling is the task of predicting the author’s age and gender from his/her writing. In this paper, we follow a two-level ensemble approach to tackle the cross-genre author profiling task where training documents and testing documents are from different g...
Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations
This overview presents the framework and the results of the Author Profiling task at PAN 2016. The objective was to predict age and gender from a cross-genre perspective. For this purpose a corpus from Twitter has been provided for training, and different corpora from social media, blogs, essays, and reviews have been provided for evaluation. Altogether, the approaches of 22 participants were e...
Profiling Microblog Authors using Concreteness and Sentiment - Know-Center at PAN 2016 Author Profiling
The PAN 2016 author profiling task is a supervised classification problem on cross-genre documents (tweets, blog and social media posts). Our system makes use of concreteness, sentiment and syntactic information present in the documents. We train a random forest model to identify gender and age of a document’s author. We report the evaluation results received by the shared task.
Cross-Genre Author Profile Prediction Using Stylometry-Based Approach
The author profiling task aims to identify different traits of an author by analyzing his/her written text. This study presents a stylometry-based approach for detecting author traits (gender and age) in cross-genre author profiles. In our proposed approach, we used different types of stylistic features including 7 lexical features, 16 syntactic features, 26 character-based features and 6 vocab...